Hybrid Grapheme to Phoneme Conversion forUnlimited

نویسندگان

  • BYEONGCHANG KIM
  • GEUNBAE LEE
  • JONG-HYEOK LEE
  • Byeongchang Kim
  • Geunbae Lee
  • Jong-Hyeok Lee
چکیده

Both dictionary-based and rule-based methods on grapheme-to-phoneme conversion have their own advantages and limitations. For example, a large sized phonetic dictionary and complex morphophonemic rules are required for the dictionary-based method and the LTS(letter to sound) rule-based method itself cannot model the complete morphophonemic constraints. This paper describes a grapheme-to-phoneme conversion method using a dictionary-based and rule-based hybrid method with a phonetic pattern dictionary and CCV (consonant consonant vowel) LTS (letter to sound) rules. The phonetic pattern dictionary, standing for the dictionary-based method, contains entries in the form of a morpheme pattern and its phonetic pattern. The patterns represent candidate phonological changes in left and right boundaries of morphemes. Obviously, the CCV LTS rules stands for the rule-based method. The rules are in charge of grapheme-to-phoneme conversion within morphemes. The conversion method consists of mainly two steps including morpheme to phoneme conversion and morphophonemic connectivity check, and two preprocessing steps including phrase break detection and morpheme normalization. Phrase break detection presumes phrase breaks using the stochastic method on part-of-speech (POS) information. Morpheme normalization is to replace non-Korean symbols with their corresponding standard Korean graphemes. In the morpheme phoneticizing module, each morpheme in the phrase is converted into phonetic patterns by looking it up in the phonetic pattern dictionary. Graphemes within a morpheme are grouped into CCV units and converted into phonemes by the CCV LTS rules. The morphophonemic connectivity table supports grammaticality checking of the two adjacent phonetic morphemes. In experiments with a non-Korean symbol free corpus of 4,973 sentences, we achieved a 99.98% grapheme-to-phoneme conversion performance rate and a 99.0% sentence conversion performance rate. With a news broadcast corpus of 621 sentences, 99.7% of the graphemes and 86.6% of the sentences are correctly converted. The full Korean TTS (Text-to-Speech) system is now being implemented using this conversion method.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Hybrid Approach to Grapheme-Phoneme Conversion

We present a simple and effective approach to the task of grapheme-tophoneme conversion based on a set of manually edited grapheme-phoneme mappings which drives not only the alignment of words and corresponding pronunciations, but also the segmentation of words during model training and application, respectively. The actual conversion is performed with the help of a conditional random field mod...

متن کامل

Rule-based Korean Grapheme to Phoneme Conversion Using Sound Patterns

Grapheme-to-phoneme conversion plays an important role in text-to-speech applications and other fields of computational linguistics. Although Korean uses a phonemic writing system, it must have a grapheme-to-phoneme conversion for speech synthesis because Korean writing system does not always reflect its actual pronunciations. This paper describes a grapheme-to-phoneme conversion method based o...

متن کامل

Unlimited Vocabulary Grapheme to PhonemeConversion with Probabilistic Phrase Break Detection

This paper describes a grapheme-to-phoneme conversion method using phoneme con-nectivity and CCV conversion rules with probabilistic phrase break detection. The method consists of mainly four modules including phrase-break detection, morpheme normalization, morpheme to phoneme conversion and phoneme connectivity check. In the experiments with a test corpus of 210 sentences, we achieved 85% of p...

متن کامل

Solving the Phoneme Conflict in Grapheme-to-Phoneme Conversion Using a Two-Stage Neural Network-Based Approach

To achieve high quality output speech synthesis systems, data-driven grapheme-to-phoneme (G2P) conversion is usually used to generate the phonetic transcription of out-of-vocabulary (OOV) words. To improve the performance of G2P conversion, this paper deals with the problem of conflicting phonemes, where an input grapheme can, in the same context, produce many possible output phonemes at the sa...

متن کامل

Probabilistic Context-Free Grammars for Syllabification and Grapheme-to-Phoneme Conversion

We investigated the applicability of probabilistic context-free grammars to syllabi cation and grapheme-to-phoneme conversion. The results show that the standard probability model of context-free grammars performs very well in predicting syllable boundaries. However, our results indicate that the standard probability model does not solve grapheme-to-phoneme conversion su ciently although, we va...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1998